Abstract

The paper considers probability distribution, density, conditional distribution and density, and conditional moments, as well as their kernel estimators, in spaces of generalized functions. This approach does not require the restrictions on classes of distributions that are common in nonparametric estimation. The inverse problem that defines the density is not well-posed in the usual function spaces; this paper establishes existence and well-posedness of the generalized density function. It also demonstrates root-n convergence of the kernel density estimator in the space of generalized functions. It is shown that the usual kernel estimator of the conditional distribution converges at a parametric rate, as a random process in the space of generalized functions, to a limit Gaussian process regardless of pointwise existence of the conditional distribution. Conditional moments such as the conditional mean can also be characterized via generalized functions. Convergence of the kernel estimators to the limit Gaussian process is shown to hold as long as the appropriate moments exist.
1 Introduction



A probability distribution function, $F$, that corresponds to a Borel measure on a Euclidean space $R^k$ (or its subspace) is always defined in the space of bounded functions. It can be viewed as the right-hand side of an integral equation:

$$I(f) = F, \qquad (1)$$

where the density represents the solution to the inverse problem

$$f = \partial^k F. \qquad (2)$$

Here $I$ represents an integration operator for $R^k$: $I(f)(x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f(w)\, dw_1 \ldots dw_k$, and $\partial^k = \frac{\partial^k}{\partial x_1 \cdots \partial x_k}$ is the inverse (differentiation) operator.

When does the solution to the inverse problem exist? 

In the usual approach the integral operator $I$ is assumed to operate on the space of integrable functions, e.g. $L_1$ (absolutely integrable functions) or $L_2$ (square integrable functions); see e.g. Devroye and Györfi (1985), Carrasco, Florens and Renault (2007). The operator $I$ maps density functions in $L_1$ into the space of absolutely continuous distribution functions. In this case the inverse operator $\partial^k$ is defined and the inverse problem has a unique solution.

The property of well-posedness requires that the solution depend continuously on the right-hand side function; in other words, if distribution functions are close, the corresponding densities should be close as well. However, in spaces of integrable functions the inverse problem is not well-posed: while the operator $I$ is continuous on $L_1$ (or another $L_p$ space), the inverse operator $\partial^k$ is not. The example below (from Zinde-Walsh, 2011) illustrates the lack of well-posedness.

Example. Consider the space $D([0,1])$ of univariate absolutely continuous distribution functions on the interval $[0,1]$ in the uniform metric: the distance between two distributions $F_1, F_2$ is

$$d(F_1, F_2) = \max_{x \in [0,1]} |F_1(x) - F_2(x)|;$$

this is the image space of the operator $I(\cdot)$ defined on $L_1([0,1])$.

Denote by $[v]$ the integer part of $v$, that is, the largest integer that is $\le v$. Let $I(x \in A)$ denote the indicator function of the set $A$, which equals 1 if $x$ is in $A$ and zero otherwise. With $\varepsilon = \frac{1}{2N}$ for a positive integer $N$, define the densities

$$f_1(x) = \sum_{m=0}^{[1/(2\varepsilon)]-1} 2\, I\big(x \in [2m\varepsilon, (2m+1)\varepsilon)\big), \qquad f_2(x) = \sum_{m=0}^{[1/(2\varepsilon)]-1} 2\, I\big(x \in [(2m+1)\varepsilon, (2m+2)\varepsilon)\big).$$

The densities $f_1$ and $f_2$ have supports that do not intersect, and it is easily seen that at each point of $[0,1)$ they differ by 2: $|f_1(x) - f_2(x)| = 2$; it follows that the $L_1([0,1])$ distance between them is 2. The corresponding distributions are $F_1 = I(f_1)$ and $F_2 = I(f_2)$. It is easy to establish by integration that

$$\max_{x \in [0,1]} |F_1(x) - F_2(x)| \le 2\varepsilon = \frac{1}{N},$$

which can be made arbitrarily small by taking $N$ large, while the $L_1$ distance between the densities stays equal to 2; thus the inverse operator is not continuous.

Hence, although a solution to the inverse problem in the $L_1$ space exists for absolutely continuous distributions, the problem is not well-posed.
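The Example can be checked exactly with rational arithmetic. The sketch below is an illustration, not part of the original argument: it uses $F_2(x) = 2x - F_1(x)$ on $[0,1]$ (valid because $f_1 + f_2 = 2$ there), and the helper names `cdf_f1` and `distances` are hypothetical.

```python
import math
from fractions import Fraction


def cdf_f1(x, eps):
    """F1(x): integral over [0, x] of f1, where f1 = 2 on [2m*eps, (2m+1)*eps)."""
    q = math.floor(x / eps)          # number of complete eps-intervals in [0, x]
    r = x - q * eps                  # length of the remaining partial interval
    even_full = (q + 1) // 2         # complete intervals with even index 0, 2, 4, ...
    partial = 2 * r if q % 2 == 0 else 0
    return 2 * eps * even_full + partial


def distances(N, refine=4):
    """Return (max |F1 - F2| over [0, 1], L1 distance between f1 and f2) for eps = 1/(2N)."""
    eps = Fraction(1, 2 * N)
    # Exact rational grid on [0, 1] that contains every interval endpoint.
    grid = [j * eps / refine for j in range(2 * N * refine + 1)]
    # On [0, 1], f1 + f2 = 2, hence F2(x) = 2x - F1(x) and F1 - F2 = 2 F1 - 2x.
    sup_cdf = max(abs(2 * cdf_f1(x, eps) - 2 * x) for x in grid)
    # |f1 - f2| = 2 on each of the 2N intervals of length eps.
    l1_density = sum(2 * eps for _ in range(2 * N))
    return sup_cdf, l1_density


for N in (1, 10, 100):
    sup_cdf, l1 = distances(N)
    print(f"N={N:3d}  max|F1-F2| = {float(sup_cdf):.4f}  L1(f1, f2) = {float(l1):.1f}")
```

Since $F_1 - F_2$ is piecewise linear with kinks only at the interval endpoints, its maximum over the grid is its maximum over $[0,1]$. The uniform distance shrinks like $1/N$ while the $L_1$ distance of the densities stays at 2, which is exactly the failure of continuity of $\partial^k$.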

By contrast, in the appropriate space of generalized functions the solution to the density problem exists without any restrictions on the distribution function and is well-posed; as proved in section 2 below, this follows from the known properties of generalized functions. The fact that generalized functions can be useful when non-differentiability prevents the use of Taylor expansions was discussed e.g. in Phillips (1991) for LAD estimation, and this line of work continued in the econometric literature that followed.

The statistical inverse problem is often solved with a kernel density estimator. Consider a random sample of observations from a distribution $F$, $\{x_i\}_{i=1}^n$, $x_i \in R^k$. With a chosen kernel function $K$ and bandwidth (vector) $h$ the estimator is

$$\hat f(x) = \frac{1}{n \prod_{j=1}^k h_j} \sum_{i=1}^n K\Big(\frac{x_i - x}{h}\Big), \qquad (3)$$

where $h$ has components $h_1, \ldots, h_k$ and $K\big(\frac{x_i - x}{h}\big)$ is a multivariate function with the argument $\big(\frac{x_{i1} - x_1}{h_1}, \ldots, \frac{x_{ik} - x_k}{h_k}\big)$. We shall proceed with the following assumption on the kernel.

Assumption 1 (kernel).

(a). $K(w)$ is an ordinary bounded function on $R^k$; $\int K(w)\, dw = 1$;

(b). The support of $K$ belongs to $[-1,1]^k$;

(c). $K(w)$ is an $l$-th order kernel: for $w = (w_1, \ldots, w_k)$ the integral

$$\int w_1^{j_1} \cdots w_k^{j_k} K(w)\, dw_1 \ldots dw_k \;\begin{cases} = 0 & \text{if } 1 \le j_1 + \ldots + j_k < l; \\ < \infty & \text{if } j_1 + \ldots + j_k = l. \end{cases}$$



The finite support and boundedness assumptions can be relaxed; they are introduced to simplify the assumptions and derivations. $K$ is not restricted to be symmetric or non-negative.
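As an illustration of (3) and Assumption 1 (not taken from the paper), the sketch below builds a product kernel from the univariate Epanechnikov kernel $k(u) = \frac34(1-u^2)$ on $[-1,1]$, which is bounded, has support in $[-1,1]^k$ and has order $l = 2$. The choice of kernel, the simulated uniform sample and all function names are assumptions made for the example; the moment check uses a plain midpoint rule.

```python
import random


def epanechnikov(u):
    """Univariate second-order kernel: bounded, supported on [-1, 1], integrates to 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def product_kernel(w):
    """K(w) = k(w_1) * ... * k(w_k): bounded, support [-1, 1]^k, order l = 2."""
    out = 1.0
    for wj in w:
        out *= epanechnikov(wj)
    return out


def kde(x, sample, h):
    """Estimate (3) at x; the kernel argument is ((x_i1 - x_1)/h_1, ..., (x_ik - x_k)/h_k)."""
    scale = len(sample)
    for hj in h:
        scale *= hj                      # n * h_1 * ... * h_k
    total = sum(
        product_kernel([(xij - xj) / hj for xij, xj, hj in zip(xi, x, h)])
        for xi in sample
    )
    return total / scale


def kernel_moment(j, m=20000):
    """Midpoint-rule approximation of the integral of w^j k(w) over [-1, 1]."""
    step = 2.0 / m
    return sum(((-1.0 + (i + 0.5) * step) ** j) * epanechnikov(-1.0 + (i + 0.5) * step)
               for i in range(m)) * step


rng = random.Random(0)
sample = [(rng.random(), rng.random()) for _ in range(20000)]  # uniform on [0, 1]^2, f = 1
print(kde((0.5, 0.5), sample, (0.2, 0.2)))                      # should be close to 1
print([round(kernel_moment(j), 6) for j in range(3)])           # moments of order 0, 1, 2
```

The printed moments should be close to 1, 0 and 0.2: the first-order moment vanishes while the second is finite and nonzero, as required for $l = 2$.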

Denote by $\mathcal{K}$ the integral of the kernel function, $\mathcal{K}(w) = \int_{-\infty}^{w_1} \cdots \int_{-\infty}^{w_k} K(u)\, du_1 \ldots du_k$; then

$$\hat F(x) = \frac{1}{n} \sum_{i=1}^n \mathcal{K}\Big(\frac{x - x_i}{h}\Big) \qquad (4)$$

is an estimator of the distribution function $F(x)$. The properties of these estimators depend on $K$ and $h$ and are well established (Azzalini, 1981). Generally, for $h \to 0$ as $n \to \infty$ with $nh \to \infty$, $\hat F(x)$ is a root-n consistent and asymptotically Gaussian estimator of $F(x)$ at any point of continuity; the uniform norm of the difference, $\sup_x |\hat F(x) - F(x)|$, converges to zero.
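A univariate sketch of (4), again assuming the Epanechnikov kernel, whose integral is $\mathcal{K}(w) = \frac12 + \frac34\big(w - \frac{w^3}{3}\big)$ on $[-1,1]$ (0 below, 1 above). The uniform sample, the bandwidth $h = n^{-1/3}$ and the evaluation grid are illustrative assumptions; the grid is kept away from the boundary of $[0,1]$, where smoothing adds an $O(h)$ bias.

```python
import random


def integrated_epanechnikov(w):
    """Integrated kernel: integral of 0.75 * (1 - u^2) over [-1, min(w, 1)]."""
    if w <= -1.0:
        return 0.0
    if w >= 1.0:
        return 1.0
    return 0.5 + 0.75 * (w - w ** 3 / 3.0)


def smoothed_cdf(x, sample, h):
    """Estimator (4) of F(x) for a univariate sample with scalar bandwidth h."""
    return sum(integrated_epanechnikov((x - xi) / h) for xi in sample) / len(sample)


rng = random.Random(1)
n = 5000
sample = [rng.random() for _ in range(n)]   # true F(x) = x on [0, 1]
h = n ** (-1.0 / 3.0)
grid = [0.1 + 0.01 * j for j in range(81)]  # interior points 0.10, ..., 0.90
sup_err = max(abs(smoothed_cdf(x, sample, h) - x) for x in grid)
print(f"h = {h:.3f}, max over grid of |F_hat - F| = {sup_err:.4f}")
```

On the interior grid the estimator is unbiased for this linear $F$ (the kernel is symmetric), so the reported error reflects only sampling noise of order $n^{-1/2}$.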

Known convergence properties of $\hat f(x)$ are more complicated; they rely on assumptions about the existence and smoothness of the density $f(x)$; the convergence rate is slower than root-n and depends on the order of the kernel and on the rate at which the bandwidth $h \to 0$ (Pagan and Ullah, 1999). As shown in Examples 3-5 in Zinde-Walsh (2008), the estimator $\hat f(x)$ fails to converge pointwise if the distribution is not absolutely continuous (e.g. at a mass point or for a fractal measure); of course, in those cases the density itself cannot be defined pointwise and exists only as the solution $f$ in (2) to the inverse problem in the space of generalized functions.
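The failure at a mass point is easy to see numerically. In the sketch below (a hypothetical mixture with probability 1/2 at the point 0.5 and otherwise uniform on $[0,1]$, with the Epanechnikov kernel; this is not one of the examples in Zinde-Walsh, 2008), $\hat f(0.5)$ grows like $1/h$, while $h \hat f(0.5)$ settles near $\frac12 K(0) = 0.375$.

```python
import random


def epanechnikov(u):
    """Second-order kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde_1d(x, sample, h):
    """Univariate kernel density estimate at x."""
    return sum(epanechnikov((xi - x) / h) for xi in sample) / (len(sample) * h)


rng = random.Random(2)
n = 20000
# Hypothetical distribution: mass 1/2 at the point 0.5, otherwise uniform on [0, 1].
sample = [0.5 if rng.random() < 0.5 else rng.random() for _ in range(n)]

for h in (0.1, 0.03, 0.01):
    value = kde_1d(0.5, sample, h)
    # h * f_hat(0.5) should settle near (mass at 0.5) * K(0) = 0.5 * 0.75 = 0.375.
    print(f"h={h:5.2f}  f_hat(0.5)={value:8.2f}  h*f_hat(0.5)={h * value:.3f}")
```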

When considered in the space of generalized functions, the estimators $\hat f$ are viewed as random continuous linear functionals on spaces of well-behaved functions, where convergence to generalized derivatives of distribution functions (solutions to the inverse problem) can be established without any assumptions on the underlying distribution. Moreover, convergence of kernel estimators can be faster, even at parametric rates. This result has features in common with other results on convergence of random functionals of density, as discussed e.g. in Anderson et al (2012), and is derived here in section 3. It relies on the derivation of the rate of the bias in generalized functions that was provided in Zinde-Walsh (2008), but gives a derivation of the covariance functional that corrects the one in that paper.

Conditioning is somewhat awkward and there are many different ways to streamline the representation of conditional measures and distribution functions (Chang and Pollard, 1997, Pfanzagl, 1979, among others). Here we focus on the distribution function $F(x,y)$ on $R^{d_x} \times R^{d_y}$ and the distribution of $y \in R^{d_y}$ conditional on $x \in R^{d_x}$. In this case the conditional distribution function $F_{y|x}$ is typically represented via the fraction $\frac{\partial^{d_x} F(x,y)}{f_x(x)}$, where the differentiation operator is applied to the $x$ argument of $F(x,y)$ and $f_x(x)$ represents the density of the marginal distribution. Of course, such a representation places stringent requirements on the smoothness of the appropriate functions. Here the case of an arbitrary continuous conditioning distribution is considered without requiring differentiability; it is shown that in this case the conditional distribution and conditional density have a straightforward representation as generalized functions on appropriate spaces. The representation is in terms of functionals involving the conditioning distribution (rather than the conditioning variable) as an argument; this representation avoids the nonlinearity introduced by the denominator. When the usual representation holds, a simple correspondence between the two representations is established. The conditional density $f_{y|x}$ is defined as a generalized derivative of the conditional distribution generalized function.

The convergence of the usual kernel estimator of the conditional distribution is known under smoothness assumptions (Pagan and Ullah, 1999, Li and Racine, 2007) and utilizes the properties of the kernel density estimator; the density appears in the denominator of the statistic, requiring some support assumptions and possibly regularization to converge. Here the root-n convergence of the kernel estimator to a limit Gaussian process in generalized function space is established without any extra restrictions on the distribution.

An interpretation of a conditional moment function is provided here in the space of generalized functions, again without any restriction beyond continuity of the conditioning distribution. For estimators such as the kernel estimator of the conditional mean, the asymptotic properties are established: root-n convergence in generalized functions obtains without any restrictions on the smoothness of distribution functions.

The theoretical results of this paper extend the usual representation of the density, conditional distribution and density, and conditional moments to situations where these may not exist in an ordinary sense. The advantage that this approach provides is its generality. On the other hand, the topology in the spaces of generalized functions is weak, and well-posedness does not imply convergence in norm.

The asymptotic results provide a general approach, so that when the usual assumptions fail there is still a sense in which consistency holds. Moreover, a root-n convergence rate obtains, again as a consequence of the weak topology, with no guarantee of good convergence in norm. The practical advantage is the possibility of utilizing the generalized random process and its limit process for inference without placing any restrictions on the distribution.

2 Density as solution to a well-posed inverse problem in the space of generalized functions

For the definitions and results pertaining to spaces of generalized functions the main references are the books by Schwartz (1966) and Gel'fand and Shilov (1964). A useful summary is in Zinde-Walsh (2008, 2012); the main definitions follow.

Consider a space of well-behaved "test" functions, $D_\infty(R^k)$ of infinitely differentiable functions with bounded support, or any of the spaces $D_m(R^k)$ of $m$ times continuously differentiable functions (with bounded support); sometimes the domain of definition can be an open subset $W$ of $R^k$, typically here $W = (0,1)^k$. Denote the generic space by $D(W)$; convergence in $D(W)$ is defined as follows: a sequence $\psi_n \in D(W)$ converges to zero if all $\psi_n$ are defined on a common bounded support in $W$ and $\psi_n$, as well as all the $l$-th order derivatives (with $l \le m$ for $D_m$, or all $l < \infty$ for $D_\infty$), converge uniformly to zero. The space of generalized functions is the dual space $D^*$, the space of linear continuous functionals on $D(W)$ with the weak topology: a sequence of elements of $D^*$ converges if the sequence of values of the functionals converges for any test function from $D(W)$. The usual notation is to write the value of the functional $f$ applied to a test function $\psi \in D(W)$ as $(f, \psi)$; then a sequence $f_n$ converges to $f$ if for any $\psi$ the convergence $(f_n, \psi) \to (f, \psi)$ holds.

Assume that functions in $D(W)$, $W \subset R^k$, are suitably differentiable, e.g. at least $k$ times continuously differentiable. Then for any $\psi \in D(W)$ and $F \in D^*$ define a generalized derivative $f \in D^*$, $f = \frac{\partial^k F}{\partial x_1 \cdots \partial x_k}$, as the functional with values given by:

$$(f, \psi) = (-1)^k \Big(F, \frac{\partial^k \psi}{\partial x_1 \cdots \partial x_k}\Big). \qquad (5)$$

If the right-hand side is expressed via a regular locally summable function, as is the case when $F$ is a probability distribution function, then it can be computed by integration:

$$\Big(F, \frac{\partial^k \psi}{\partial x_1 \cdots \partial x_k}\Big) = \int \cdots \int F(x_1, \ldots, x_k)\, \frac{\partial^k \psi(x_1, \ldots, x_k)}{\partial x_1 \cdots \partial x_k}\, dx_1 \ldots dx_k.$$

For the function $F$ the functional on the right-hand side of (5) defines the generalized derivative $f = \frac{\partial^k F}{\partial x_1 \cdots \partial x_k}$.
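A one-dimensional numerical check of (5), under assumptions chosen only for illustration: $k = 1$, $W = (0,1)$, a hypothetical distribution with mass 1/2 at 0.3 and uniform otherwise (so no density exists as a function), and the standard bump $\psi(x) = \exp(-1/(1-t^2))$ with $t = (x - 0.5)/0.4$. The value $-(F, \psi')$ is compared with $E\psi(X)$, which is what the generalized density should deliver.

```python
import math


def bump(x, c=0.5, r=0.4):
    """Infinitely differentiable test function with support [c - r, c + r] inside W = (0, 1)."""
    t = (x - c) / r
    return math.exp(-1.0 / ((1.0 - t) * (1.0 + t))) if abs(t) < 1.0 else 0.0


def bump_prime(x, c=0.5, r=0.4):
    """Derivative of bump with respect to x."""
    t = (x - c) / r
    if abs(t) >= 1.0:
        return 0.0
    s = (1.0 - t) * (1.0 + t)
    return bump(x, c, r) * (-2.0 * t / (s * s)) / r


def F(x):
    """Hypothetical CDF on [0, 1]: mass 1/2 at 0.3 plus 1/2 times the uniform distribution."""
    return 0.5 * x + (0.5 if x >= 0.3 else 0.0)


m = 100000
step = 1.0 / m
mid = [(i + 0.5) * step for i in range(m)]
# (f, psi) = -(F, psi') as in (5) with k = 1, by the midpoint rule on [0, 1].
generalized = -sum(F(x) * bump_prime(x) for x in mid) * step
# Direct evaluation of E psi(X) for the same mixture.
direct = 0.5 * bump(0.3) + 0.5 * sum(bump(x) for x in mid) * step
print(generalized, direct)
```

The two numbers agree up to quadrature error: the jump of $F$ at 0.3 contributes exactly $\frac12\psi(0.3)$, the action of a point mass.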

First consider density as a generalized function on the space $D_\infty(W)$.

Theorem 1. The inverse problem (1) for any cumulative probability distribution function $F$ has the solution $f$ defined by (5) in the space of generalized functions $D_\infty^*$ for $D_\infty(W)$. The problem is well-posed. When the density exists as an integrable function $f(x)$, it provides the generalized function $f$ via the value of the corresponding functional:

$$(f, \psi) = \int \cdots \int f(x_1, \ldots, x_k)\, \psi(x_1, \ldots, x_k)\, dx_1 \ldots dx_k. \qquad (6)$$

Proof.

Any distribution function $F$ on $R^k$ is a monotone bounded function and as such is locally integrable on any bounded set; such a function represents a regular element in the space of generalized functions $D_\infty^*$ for $D_\infty(W)$ defined above. Then (5) defines $f$ as the generalized derivative of $F$, the generalized density function.

The differentiation operator $\partial^k = \frac{\partial^k}{\partial x_1 \cdots \partial x_k}$ on the space of generalized functions $D^*$ is defined for any regular function and is a continuous operator (Schwartz, 1966, p. 80). Thus the solution $f$ depends continuously on $F$ in these spaces, providing well-posedness.

If the density $f$ exists as a regular integrable function, its integral coincides with the function $F$, and integration by parts of (5) provides (6). Thus $f$, the solution to the inverse problem in the space $D^*$, is consistent with the solution when it exists as an ordinary function. ■

Corollary. The result of the Theorem applies in the space of generalized functions on $D_m(W)$, $m \ge k$.

Proof.

Indeed, consider the space $D_\infty(W) \subset D_k(W)$. By the theorem the inverse problem provides the density function $f$ defined as a linear continuous functional on $D_\infty(W)$ via (5). We can extend the functional $f$ to $D_k(W)$ as a linear continuous functional. First note that since $F$ is a regular locally integrable function it represents an element in $D_k^*$; then define the functional in $D_k^*$ by (5) for any $\psi \in D_k$, and denote it $\tilde f$ to distinguish it from $f$ defined on $D_\infty(W)$. This $\tilde f$ represents a linear continuous functional, so an element in $D_k^*$. There is an injective mapping of linear topological spaces $D_k^* \to D_\infty^*$ (Sobolev, 1992); by this mapping $\tilde f$ maps into $f$, and the inverse problem is solved in $D_k^*$ and is well-posed there. ■
3 Gaussian limit process for the kernel density estimator in the space of generalized functions

We now describe the limit process for the kernel estimator as $h = \max_j h_j \to 0$ with $n \to \infty$, as a generalized random process. Such a description was given in Zinde-Walsh (2008), but there was an error in the variance computation that is corrected here. The main result here is that in the generalized functions space convergence of the kernel density estimator can be at a parametric rate for a suitable selection of the kernel and bandwidth; unlike the usual case in the literature, this selection alone provides the result, independently of any properties (smoothness) of the distribution.

Recall that convergence of generalized random functions is defined (see e.g. Gel'fand and Vilenkin, 1964, or the summary in Zinde-Walsh, 2008) as weak convergence of random linear continuous functionals on the space $D$ (for any of the $D_\infty$, $D_m$, etc. spaces here) that are indexed by the functions in $D$: stochastic convergence of random functionals $\hat f$ follows from stochastic convergence of random vectors of values of the functional $\big((\hat f, \psi_1), \ldots, (\hat f, \psi_m)\big)$ for any finite set $(\psi_1, \ldots, \psi_m)$ with $\psi_j \in D$. Thus we need to consider the behavior of such random vectors.

Theorem 2 in Zinde-Walsh (2008) gives the convergence rate $O(h^l)$ for the generalized bias function of the kernel estimator based on a random sample, and the expression for the bias for $\psi \in D_{l+k}$ and a kernel $K$ of order $l$:

$$E\hat f - f = O(h^l);$$

more specifically, for any $\psi$ the bias functional provides

$$(E\hat f, \psi) - (f, \psi) = h^l (B(h,K), \psi) + R(h), \qquad (7)$$

where $R(h) = o(h^l)$; if $\psi \in D_{l+k+1}$ then $R(h) = O(h^{l+1})$. Note that $(f, \psi) = E(\psi)$, where the expectation is with respect to the measure given by $F$. Here $(B(h,K), \psi)$ denotes the expression

$$(-1)^l \sum_{j_1 + \ldots + j_k = l} \frac{h_1^{j_1} \cdots h_k^{j_k}}{h^l\, j_1! \cdots j_k!} \int \frac{\partial^l \psi(x)}{\partial x_1^{j_1} \cdots \partial x_k^{j_k}}\, dF(x) \int w_1^{j_1} \cdots w_k^{j_k} K(w)\, dw,$$

as it represents the value of a linear continuous functional $B(h,K)$ applied to $\psi$. $B(h,K)$ is the leading term in the generalized bias function for the kernel estimator:

$$\mathrm{Bias}(\hat f) = E\hat f - f = h^l B(h,K) + o(h^l), \qquad (8)$$

where for any $\psi \in D_{l+k+1}$

$$(\mathrm{Bias}(\hat f), \psi) = h^l (B(h,K), \psi) + o(h^l).$$
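The leading bias term can be checked for a distribution that has no density at all. The sketch below is an illustration under stated assumptions: $k = 1$, the Epanechnikov kernel (order $l = 2$, $\int w^2 K = 0.2$), a hypothetical three-point distribution, and the hypothetical test function $\psi(x) = (1-t^2)^5$ with $t = (x - 0.5)/0.45$, which belongs to $D_4 = D_{l+k+1}$. Because $K$ enters (3) through $(x_i - x)/h$, $(E\hat f, \psi) = E\int K(w)\psi(X - hw)\, dw$; the code integrates this by the midpoint rule and compares the bias with $h^2 (B(h,K), \psi) = h^2 \cdot \frac12 \cdot 0.2 \cdot E\psi''(X)$.

```python
def epanechnikov(u):
    """Second-order kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


C, R = 0.5, 0.45  # hypothetical test function psi(x) = (1 - t^2)^5, t = (x - C)/R


def psi(x):
    t = (x - C) / R
    return (1.0 - t * t) ** 5 if abs(t) < 1.0 else 0.0


def psi_second(x):
    """Second derivative of psi in x: (-10 u^4 + 80 t^2 u^3) / R^2 with u = 1 - t^2."""
    t = (x - C) / R
    if abs(t) >= 1.0:
        return 0.0
    u = 1.0 - t * t
    return (-10.0 * u ** 4 + 80.0 * t * t * u ** 3) / (R * R)


atoms = [0.3, 0.5, 0.7]   # hypothetical discrete distribution: no density exists
probs = [0.2, 0.5, 0.3]


def exact_bias(h, m=2000):
    """(E f_hat, psi) - (f, psi) = E int K(w) [psi(X - h w) - psi(X)] dw, midpoint rule in w."""
    step = 2.0 / m
    ws = [-1.0 + (i + 0.5) * step for i in range(m)]
    return sum(
        p * sum(epanechnikov(w) * (psi(a - h * w) - psi(a)) for w in ws) * step
        for a, p in zip(atoms, probs)
    )


def leading_term(h):
    """h^l (B(h, K), psi) for l = 2, k = 1: h^2 * (1/2) * int w^2 K(w) dw * E psi''(X)."""
    second_moment = 0.2   # int w^2 * 0.75 (1 - w^2) dw over [-1, 1]
    return h * h * 0.5 * second_moment * sum(p * psi_second(a) for a, p in zip(atoms, probs))


for h in (0.1, 0.05, 0.01):
    print(f"h={h:5.2f}  bias={exact_bias(h):+.3e}  leading={leading_term(h):+.3e}  "
          f"ratio={exact_bias(h) / leading_term(h):.4f}")
```

The ratio approaches 1 as $h \to 0$ (with a symmetric kernel the relative correction is of order $h^2$), in line with (8).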
The following Theorem gives the limit process for the kernel estimator of density.

Theorem 2. For a kernel function $K$ satisfying Assumption 1, if $h \to 0$ and $h^{2l} n = O(1)$ as $n \to \infty$, the sequence of generalized random processes $\sqrt{n}\big(\hat f - f - h^l B(h,K)\big)$ converges to a generalized Gaussian process with mean functional zero and covariance functional $C$, which for any $\psi_1, \psi_2 \in D_{l+k}$ provides

$$(C, (\psi_1, \psi_2)) = E\big([\psi_1(x) - E\psi_1(x)][\psi_2(x) - E\psi_2(x)]\big) = \mathrm{cov}(\psi_1, \psi_2). \qquad (9)$$

If $nh^{2l} \to 0$, then $\hat f - f$ converges at the parametric rate $\sqrt{n}$ to a generalized zero mean Gaussian process with covariance functional $C$ in (9).

Proof. See appendix.

The condition on the bandwidth that makes it possible to eliminate the bias asymptotically is less stringent than in the usual topologies, and also less stringent than that originally stated in Zinde-Walsh (2008). Under this requirement on the bandwidth, convergence is actually at a parametric rate and the limit covariance does not involve the kernel function.
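A small Monte Carlo experiment, again only a sketch under hypothetical choices (three-point distribution, Epanechnikov kernel, test function $(1-t^2)^5$, $n = 400$, $h = n^{-1/2}$ so that $nh^4 \to 0$), illustrates the parametric rate in Theorem 2. Since $(\hat f, \psi) = n^{-1}\sum_i \int K(w)\psi(x_i - hw)\, dw$ is linear in the sample, the smoothed test function is precomputed at each support point; the variance of $\sqrt n\,\big((\hat f, \psi) - (f, \psi)\big)$ is then compared with $(C, (\psi, \psi)) = \mathrm{var}\,\psi(X)$ from (9).

```python
import random
import statistics


def epanechnikov(u):
    """Second-order kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def psi(x, c=0.5, r=0.45):
    """Hypothetical test function (1 - t^2)^5 with t = (x - c)/r."""
    t = (x - c) / r
    return (1.0 - t * t) ** 5 if abs(t) < 1.0 else 0.0


def smoothed_psi(a, h, m=400):
    """Midpoint-rule value of int K(w) psi(a - h w) dw, the contribution of an observation a."""
    step = 2.0 / m
    ws = [-1.0 + (i + 0.5) * step for i in range(m)]
    return sum(epanechnikov(w) * psi(a - h * w) for w in ws) * step


atoms, probs = [0.3, 0.5, 0.7], [0.2, 0.5, 0.3]   # hypothetical discrete distribution
n, reps = 400, 2000
h = n ** -0.5                                      # n * h^4 -> 0, so the bias vanishes asymptotically
psi_h = {a: smoothed_psi(a, h) for a in atoms}      # (f_hat, psi) is linear in the sample
true_value = sum(p * psi(a) for a, p in zip(atoms, probs))   # (f, psi) = E psi(X)

rng = random.Random(3)
scaled = []
for _ in range(reps):
    draws = rng.choices(atoms, weights=probs, k=n)
    f_hat_psi = sum(psi_h[a] for a in draws) / n
    scaled.append(n ** 0.5 * (f_hat_psi - true_value))

# (C, (psi, psi)) from (9): the variance of psi(X) under F.
var_psi = sum(p * (psi(a) - true_value) ** 2 for a, p in zip(atoms, probs))
print(f"mean {statistics.mean(scaled):+.3f}  variance {statistics.variance(scaled):.4f}  "
      f"var psi(X) {var_psi:.4f}")
```

The simulated variance should be close to $\mathrm{var}\,\psi(X) \approx 0.11$ and the mean close to zero apart from a small remaining bias term; the kernel plays no role in the limit covariance.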

4 Distribution function conditional on some variables and conditional density in the space of generalized functions

Conditioning is an awkward operation, as discussed e.g. in Chang and Pollard (1997). Here the question posed is limited to conditioning on a variable or vector in a joint distribution, that is, given a joint distribution function $F_{x,y}(\cdot,\cdot)$ on $R^{d_x} \times R^{d_y}$, define a (generalized) function $F_{y|x}(\cdot,\cdot)$ that represents the conditional distribution of $y$ given $x$. A problem associated with such conditioning is that the conditional distribution function may not exist at every point $x$.

Denote by $F_x$, $F_y$ the marginal distribution functions of $x$, $y$, respectively.

Consider limits of ratios to define conditioning:

$$F_{y|x} = \lim_{\Delta \to 0} \frac{F_{x,y}(x + \Delta, y) - F_{x,y}(x, y)}{F_x(x + \Delta) - F_x(x)}. \qquad (10)$$



As discussed in numerous papers, there is a problem defining such a limit (e.g. Pfanzagl, 1979); here it will be demonstrated that the limit exists in a particular space of generalized functions. Assume that the distribution function $F_x$ is continuous; continuity of this distribution of course does not preclude singularity.

Assumption 2. The marginal distribution function $F_x(x)$ is continuous on $R^{d_x}$.
Note that although the support of the random $y$ belongs to $R^{d_y}$, it could be a discrete set of points; thus we do not restrict $y$ to be continuously distributed.

Consider the copula function (Sklar, 1973) $C_{F_x,F_y}(a,b)$ on $W = (0,1)^2$ that is identical to the joint distribution function; that is, for the mapping $M: R^{d_x} \times R^{d_y} \to W$ defined by $\{x, y\} \to \{F_x(x), F_y(y)\}$ we get the corresponding mapping $M^*(F_{x,y}(x,y)) = C_{M(x,y)}(M(x,y))$ with

$$C_{M(x,y)}(M(x,y)) = C_{F_x,F_y}(F_x(x), F_y(y)) = F_{x,y}(x,y).$$

Thus (10) is equivalent to

$$F_{y|x} = \lim_{\Delta \to 0} \frac{C_{F_x,F_y}(F_x(x+\Delta), F_y(y)) - C_{F_x,F_y}(F_x(x), F_y(y))}{F_x(x+\Delta) - F_x(x)};$$

denote $F_x(x+\Delta) - F_x(x)$ by $\tilde\Delta$; then by Assumption 2 (continuity of $F_x$), $\Delta \to 0$ implies $\tilde\Delta \to 0$, and thus, with $a = F_x(x)$ and $b = F_y(y)$, the limit is equivalent to

$$\lim_{\tilde\Delta \to 0} \frac{C_{F_x,F_y}(a + \tilde\Delta, b) - C_{F_x,F_y}(a, b)}{\tilde\Delta}.$$

Since with respect to its second argument the copula function and the limit are ordinary functions, we concentrate on being able to define the generalized derivative with respect to the first argument. In particular, for any $\psi \in D(W)$, given the second argument, the value of the functional is $\big(\partial_a C_{F_x,F_y}, \psi\big) = -(C_{F_x,F_y}, \psi')$. This implies that we can define the value of the functional $F_{y|x}$ on $D(W)$ by

$$(F_{y|x}, \psi) = -(C_{F_x,F_y}, \psi') = -\int F_{x,y}(x,y)\, \psi'(F_x(x))\, dF_x(x). \qquad (11)$$

Thus we can define the conditional distribution $F_{y|x}$ as a generalized function in the space $D^*(W)$.

When $d_x = 1$ this is an exhaustive representation. When $d_x > 1$ it may be advantageous to consider a derivative with respect to a $d_x$-dimensional argument. Consider the conditioning vector $x$ component-wise, and consider the multivariate copula function $C_{F_{x_1},\ldots,F_{x_{d_x}},F_y}(F_{x_1}, \ldots, F_{x_{d_x}}, F_y)$; to simplify notation we drop the subscript and denote it simply by $C$. Then by a similar argument, for any $\psi \in D(W)$ where $W = (0,1)^{d_x}$, we obtain

$$(F_{y|x}, \psi) = (-1)^{d_x} (C, \partial^{d_x}\psi) = (-1)^{d_x} \int \cdots \int F_{x,y}(x,y)\, \partial^{d_x}\psi\big(F_{x_1}(x_1), \ldots, F_{x_{d_x}}(x_{d_x})\big)\, dF_{x_1}(x_1) \ldots dF_{x_{d_x}}(x_{d_x}). \qquad (12)$$
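For $d_x = 1$ and a case in which the ordinary conditional distribution exists, (11) can be compared with $\int F_{y|x}(x,y)\psi(F_x(x))\, dF_x(x)$. The sketch below assumes, purely for illustration, $x \sim U(0,1)$ and, given $x$, $y \sim$ Bernoulli($x$), so that $y$ is discrete; for $0 \le y < 1$ this gives $F_{x,y}(x,y) = x - x^2/2$, $F_x(x) = x$ and $F_{y|x}(x,y) = 1 - x$. The test function $\psi$ is a hypothetical polynomial bump supported inside $(0,1)$.

```python
def bump(a, c=0.5, r=0.45):
    """Hypothetical test function psi(a) = (1 - t^2)^5, t = (a - c)/r, supported inside (0, 1)."""
    t = (a - c) / r
    return (1.0 - t * t) ** 5 if abs(t) < 1.0 else 0.0


def bump_prime(a, c=0.5, r=0.45):
    """Derivative of bump with respect to a."""
    t = (a - c) / r
    return -10.0 * t * (1.0 - t * t) ** 4 / r if abs(t) < 1.0 else 0.0


# Hypothetical joint law: x ~ U(0, 1) and, given x, y ~ Bernoulli(x).
# For 0 <= y < 1: F_xy(x, y) = P(X <= x, Y = 0) = x - x^2/2 and F_x(x) = x.
def F_xy(x):
    return x - x * x / 2.0


m = 20000
step = 1.0 / m
grid = [(i + 0.5) * step for i in range(m)]   # values of a = F_x(x) = x

# Right-hand side of (11): -int F_xy(x, y) psi'(F_x(x)) dF_x(x).
via_11 = -sum(F_xy(a) * bump_prime(a) for a in grid) * step
# Ordinary representation: int F_{y|x}(x, y) psi(F_x(x)) dF_x(x) with F_{y|x} = 1 - x.
ordinary = sum((1.0 - a) * bump(a) for a in grid) * step
print(via_11, ordinary)
```

The two integrals coincide up to quadrature error; the identity is simply integration by parts in the copula argument $a = F_x(x)$.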

Remark 1. Similarly to the Corollary, the generalized function $F_{y|x}$ can be extended from a linear continuous functional defined on the space $D_\infty(W)$ of infinitely differentiable functions to a linear continuous functional defined by (11) on any space $D_k(W)$ with $k \ge 1$, and by (12) on $D_k(W)$ for the corresponding $W$ and $k \ge d_x$.

Remark 2. If the function $C$ were suitably differentiable, the functional $(F_{y|x}, \psi)$ would be defined for any continuous $\psi$ with bounded support, that is, on the space $D_0(W)$, by $(\partial^{d_x} C(\cdot,\cdot), \psi)$:

$$(F_{y|x}, \psi) = \int \cdots \int \partial^{d_x} C(F_{x_1}, \ldots, F_{x_{d_x}}, F_y)\, \psi(F_{x_1}, \ldots, F_{x_{d_x}})\, dF_{x_1} \ldots dF_{x_{d_x}}. \qquad (13)$$



In the $y$ argument the conditional distribution is an ordinary function, so here $y$ is considered just as a parameter of the generalized function. However, the definition of $F_{y|x}$ in (12) can be extended to a functional for functions defined on the product space; for any $\psi_{xy} = \psi_x(x_1, \ldots, x_{d_x})\psi_y(y_1, \ldots, y_{d_y}) \in D((0,1)^{d_x}) \times D(R^{d_y})$ define the value of the functional by

$$(F_{y|x}, \psi_{xy}) = (-1)^{d_x} \int \cdots \int F(x,y)\, \partial^{d_x}\psi_x(F_{x_1}, \ldots, F_{x_{d_x}})\, \psi_y(y_1, \ldots, y_{d_y})\, dF_{x_1} \ldots dF_{x_{d_x}}\, dy_1 \ldots dy_{d_y}.$$



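To define the conditional density $f_{y|x}$ as a generalized function one would have

$$(f_{y|x}, \psi_{xy}) = (-1)^{d_x + d_y} \int \cdots \int F_{x,y}(x,y)\, \partial^{d_x}\psi_x(F_{x_1}, \ldots, F_{x_{d_x}})\, \partial^{d_y}\psi_y(y_1, \ldots, y_{d_y})\, dF_{x_1}(x_1) \ldots dF_{x_{d_x}}(x_{d_x})\, dy_1 \ldots dy_{d_y}. \qquad (14)$$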



In general, the conditional distribution and conditional density depend on the conditioning variables $x$ via the marginals $F_x$; considering generalized functions makes this explicit.

There are cases when the conditional distribution and conditional density are defined on the Euclidean space $R^{d_x}$. This is possible if the distribution function $F_x$ is strictly monotone in each argument; then the corresponding generalized density function is positive; moreover, since a monotone function is a.e. differentiable, $\partial^{d_x} F_{x,y}(x,y)$ and $f_x(x) = \partial^{d_x} F_x(x)$ exist a.e. and $f_x(x) > 0$. When the density $f_x$ is a continuous function, the conditional distribution can be represented as a functional on a function space on $R^{d_x}$ that can be derived from the general representation above in $D^*(W)$.

Indeed, any distribution function $F(x,y)$, where we focus on the argument $x$, can via the copula representation be considered as a functional on $D(W)$. Let $\Phi$ denote the class of such distribution functions; then $\Phi \subset D^*(W)$. Moreover, the representation (12) demonstrated that any conditional distribution $F_{y|x}(x,y)$ also defines a linear continuous functional on $D(W)$. Denoting by $\Phi_{|x}$ the class of conditional distributions, we have thus shown that $\Phi_{|x} \subset D^*(W)$. By Remark 1, we can relax the differentiability conditions and consider $\Phi_{|x} \subset D_k^*(W)$; when the distribution function is differentiable in $x$, we set $k = 0$ (Remark 2). In that case a continuous density function $f_x > 0$ exists and the conditional distribution can be represented by the ordinary function $\frac{\partial^{d_x} F_{x,y}(x,y)}{f_x(x)}$; denote by $\Phi_c$ the class of distributions that are continuously differentiable in $x$ with $f_x > 0$ on $R^{d_x}$, and by $\Phi_{c|x}$ the class of corresponding conditional distributions. Then $\Phi_c \subset D_0^*(R^{d_x})$ and also $\Phi_{c|x} \subset D_0^*(R^{d_x})$, where the space $D_0(R^{d_x})$ is the space of continuous functions with bounded support in $R^{d_x}$. Since $\Phi_{c|x} \subset \Phi_{|x}$, any conditional distribution that exists in the ordinary sense, and thus is in $\Phi_{c|x}$, has two representations: one as a functional on $D(W)$ defined above, and the second as a functional on $D_0(R^{d_x})$ that provides for any $\tilde\psi \in D_0(R^{d_x})$

$$(F_{y|x}, \tilde\psi) = \int \cdots \int \frac{\partial^{d_x} F_{x,y}(x,y)}{f_x(x)}\, \tilde\psi(x)\, dx_1 \ldots dx_{d_x}. \qquad (15)$$

The following lemma shows that the two representations are compatible and 
each can be easily obtained from the other. 

Lemma. Suppose that $F_{x,y} \in \Phi_c$. Then the value of the functional given by (13) for $\psi \in D_0((0,1)^{d_x})$ is the same as the value of the functional given by (15) for $\tilde\psi(x) = f_x(x)\psi(F(x)) \in D_0(R^{d_x})$, where $F(x) = (F_{x_1}(x_1), \ldots, F_{x_{d_x}}(x_{d_x}))$; and vice versa: given (15), the value of (13) for $\psi(F_{x_1}, \ldots, F_{x_{d_x}}) = \frac{\tilde\psi(x_1, \ldots, x_{d_x})}{f_x(x_1, \ldots, x_{d_x})}$, where each $x_i$ is uniquely determined by the value of $F_{x_i}$, $x_i = F_{x_i}^{-1}(F_{x_i}(x_i))$, is the same.

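Proof. For any $\psi \in D_0((0,1)^{d_x})$ define $\tilde\psi$ on $R^{d_x}$ by $\tilde\psi(x) = f_x(x)\psi(F(x))$. We show that $(F_{y|x}, \psi)$ defined by (12) is, by differentiability of $F_{x,y}$ in $x$, equal to

$$(F_{y|x}, \tilde\psi) = \int \cdots \int \frac{\partial^{d_x} F_{x,y}(x,y)}{f_x(x)}\, \tilde\psi(x)\, dx_1 \ldots dx_{d_x}.$$

Denote by $z_i$ the value $F_{x_i}(x_i)$, $i = 1, \ldots, d_x$; then (for clarity we subscript the operator $\partial$ by the variable(s) with respect to which we differentiate)

$$\partial_z^{d_x} F_{x,y}\big(F_{x_1}^{-1}(z_1), \ldots, F_{x_{d_x}}^{-1}(z_{d_x}), y\big) \prod_{i=1}^{d_x} f_{x_i}(x_i) = \partial_x^{d_x} F_{x,y}(x,y).$$

The right-hand side of (12) provides, using the change of variables $z = F(x)$ (for which $dz_1 \ldots dz_{d_x} = \prod_i f_{x_i}(x_i)\, dx_1 \ldots dx_{d_x}$), integration by parts, and the identity above,

$$(-1)^{d_x} \int \cdots \int F_{x,y}\big(F_{x_1}^{-1}(z_1), \ldots, F_{x_{d_x}}^{-1}(z_{d_x}), y\big)\, \partial_z^{d_x}\psi(z_1, \ldots, z_{d_x})\, dz_1 \ldots dz_{d_x}$$
$$= \int \cdots \int \partial_z^{d_x} F_{x,y}\big(F_{x_1}^{-1}(z_1), \ldots, F_{x_{d_x}}^{-1}(z_{d_x}), y\big)\, \psi(z_1, \ldots, z_{d_x})\, dz_1 \ldots dz_{d_x}$$
$$= \int \cdots \int \frac{\partial_x^{d_x} F_{x,y}(x,y)}{f_x(x)}\, \psi(F(x))\, f_x(x)\, dx_1 \ldots dx_{d_x},$$

and, writing this in more concise notation,

$$\int \frac{\partial^{d_x} F_{x,y}(x,y)}{f_x(x)}\, \psi(F(x)) f_x(x)\, dx = \int \frac{\partial^{d_x} F_{x,y}(x,y)}{f_x(x)}\, \tilde\psi(x)\, dx.$$

Since $f_x$ is continuous, $\tilde\psi(x) = \psi(F(x)) f_x(x)$ is continuous on $R^{d_x}$.

For an arbitrary $\tilde\psi \in D_0(R^{d_x})$ consider (15). Do the transformation $z = F(x)$; then

$$(F_{y|x}, \tilde\psi) = \int \partial_z^{d_x} F_{x,y}(F^{-1}(z), y)\, \frac{\tilde\psi(F^{-1}(z))}{f_x(F^{-1}(z))}\, dz.$$

Define a continuous function $\psi(F_{x_1}, \ldots, F_{x_{d_x}}) = \frac{\tilde\psi(x_1, \ldots, x_{d_x})}{f_x(x_1, \ldots, x_{d_x})}$ on $(0,1)^{d_x}$; then this equals (13). ■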



Suppose now that $F_x$ is absolutely continuous with a continuous density function $f_x$; then the set $S_x = \{x : f_x(x) > 0\}$ on which the density is positive is open in $R^{d_x}$. The Lemma applies by considering $\tilde\psi(x) = f_x(x)\psi(F(x)) \in D_0(S_x)$ in place of $D_0(R^{d_x})$.



5 Limit properties of kernel estimators of conditional distribution in generalized functions

Consider the usual kernel estimator of conditional distribution; typically its limit properties are available under smoothness conditions on the distribution (see e.g. Li and Racine, 2007). Here the estimator is examined in the space of generalized functions without any restrictions placed on the distribution beyond Assumption 2 (continuity of $F_x$).

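Recall the usual kernel estimator of conditional distribution:

$$\hat F_{y|x}(x,y) = \frac{\sum_{i=1}^n \mathcal{G}\big(\frac{y - y_i}{h}\big)\, K\big(\frac{x_i - x}{h}\big)}{\sum_{i=1}^n K\big(\frac{x_i - x}{h}\big)}, \qquad (16)$$

where $\mathcal{G}$ is the integral of a kernel function $G$, similar to $K$, that satisfies Assumption 1 on $R^{d_y}$, and $K$ satisfies Assumption 1 on $R^{d_x}$. Sometimes $\mathcal{G}$ is assumed to be the indicator function $I(w \ge 0)$.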

To simplify exposition we assume that each component of the vector $x$ is associated with the same (scalar) bandwidth parameter $h$; it is not difficult to generalize to the case of distinct bandwidths.
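A minimal sketch of (16) for $d_x = d_y = 1$, using the indicator choice $\mathcal{G}(w) = I(w \ge 0)$, so that the numerator sums the kernel weights of the observations with $y_i \le y$, and an Epanechnikov $K$. The data-generating process ($x \sim U(0,1)$, $y \mid x \sim$ Bernoulli($x$)), the bandwidth $h = n^{-1/5}$ and the function name are assumptions for illustration; this is the pointwise estimator, not its generalized-function limit.

```python
import random


def epanechnikov(u):
    """Second-order kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def conditional_cdf_hat(x, y, xs, ys, h):
    """Estimator (16) with d_x = 1 and the indicator choice G(w) = I(w >= 0), i.e. I(y_i <= y)."""
    weights = [epanechnikov((xi - x) / h) for xi in xs]
    denom = sum(weights)
    if denom == 0.0:
        raise ValueError("no observations within one bandwidth of x")
    return sum(w for w, yi in zip(weights, ys) if yi <= y) / denom


rng = random.Random(4)
n = 2000
xs = [rng.random() for _ in range(n)]
ys = [1 if rng.random() < xi else 0 for xi in xs]   # hypothetical: y | x ~ Bernoulli(x)
h = n ** -0.2
print(conditional_cdf_hat(0.5, 0.5, xs, ys, h))     # true F_{y|x}(0.5, 0.5) = 1 - 0.5
```

The denominator equals $nh\hat f_x(x)$, which is why a pointwise analysis of (16) usually needs assumptions on the marginal density.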

Theorem 3. Suppose that Assumption 1 holds for the kernel $K$ and either a similar assumption holds for $G$ or $\mathcal{G}$ is the indicator function, that the bandwidth parameter is $h = cn^{-\alpha}$ with $\alpha < 1/2$, and that Assumption 2 holds. Then for a random sample $\{(x_i, y_i)\}_{i=1}^n$ the estimator $\hat F_{y|x}(x,y)$, as a generalized random function on $D(W)$, converges to the conditional distribution generalized function $F_{y|x}$ defined by (12) at the rate $n^{-1/2}$; the limit process for $\sqrt{n}(\hat F_{y|x} - F_{y|x})$ on $D(W)$ is given by a $\psi \in D(W)$-indexed random functional $Q_{y|x}$ with

$$(Q_{y|x}, \psi) = (-1)^{d_x}\Big[\int F_{xy}\, (\partial^{d_x}\partial^{d_x}\psi)(F_x)\, U_x\, dF_x + \int F_{xy}\, (\partial^{d_x}\psi)(F_x)\, dU_x + \int (\partial^{d_x}\psi)(F_x)\, U_{xy}\, dF_x\Big], \qquad (17)$$

where $U_x$, $U_{xy}$ are Brownian bridge processes of dimension $d_x$ and $d_y + d_x$, respectively; as a generalized random process the limit process $Q_{y|x}$ of $\sqrt{n}(\hat F_{y|x} - F_{y|x})$ is Gaussian with mean functional zero and covariance bilinear functional $C$, given for any $\psi_1, \psi_2$ by

$$(C, (\psi_1, \psi_2)) = \mathrm{cov}\big[(Q_{y|x}, \psi_1), (Q_{y|x}, \psi_2)\big].$$

Proof. See Appendix.

This result is general in that root-n convergence holds here regardless of whether the marginal density exists. If it does exist, the result could be restated for the conditional distribution as a generalized function on $D_0(R^{d_x})$ by (15).

Remark 3. Sometimes for a singular distribution the kernel estimator $\hat f_x(x)$ diverges at a specific rate, as e.g. in Lu (1999), where at points $x$ in the support $\hat f_x(x) = h^{d - d_x} b + o_p(h^{d - d_x})$ with some $b > 0$ and $d < d_x$. In the univariate case this is discussed in Example 5 in Zinde-Walsh (2008), where for the Cantor distribution it is noted that though $\hat f_x(x)$ may diverge, $h^{1-d}\hat f_x(x)$ is bounded and bounded away from zero. Then, even though the limit density does not exist, by rescaling it is possible to establish the convergence rate of the estimator of the conditional distribution as a functional on $D_0(R^{d_x})$; the rate is $n^{-1/2} h^{d_x - d}$, which is faster than the root-n rate.



6 Conditional moments

Consider now a conditional moment of a function $g(y)$ of $y \in R^{d_y}$: $E_{y|x} g(y) = m(x)$, with $m(x)$ measurable with respect to $F_x$.

When the conditional density function exists in $L_1$ we write $m(x) = \int g(y) f_{y|x}(x,y)\, dy$ (assuming that the integral exists). As a generalized function (in $x$), $m(x)$ can be presented on the space $D(W)$, $W = (0,1)^{d_x}$, by the value of the functional for $\psi$:

$$(m, \psi) = \int m(x)\, \psi(F(x))\, dF(x) = \int \Big[\int g(y) f_{y|x}(x,y)\, dy\Big] \psi(F(x))\, dF(x).$$

To give meaning to $(m, \psi)$ regardless of the existence of the conditional density as a function, $\int g(y) f_{y|x}(x,y)\, dy$ needs to be characterized as a generalized function on $D(W)$. To make this possible for an arbitrary distribution on $(x,y)$ that satisfies Assumption 2, the class of functions $g$ is restricted.

Assumption 3. The function $g$ is continuously differentiable with respect to the differentiation operator $\partial^{d_y}$.

Any polynomial function satisfies Assumption 3, so the conditional mean of $y$ or the conditional variance (if they exist) can be considered. If $g$ did not satisfy the differentiability assumption, the class of distributions would need to be correspondingly restricted.

Consider $D(\mathbb{R}^{d_y})$ and a locally finite partition of unity on $\mathbb{R}^{d_y}$ by a set of suitable "bump" functions from $D(\mathbb{R}^{d_y})$: $\{\psi_\nu\}$, where $\psi_\nu \in D(\mathbb{R}^{d_y})$, $\psi_\nu \ge 0$ and $\sum_\nu \psi_\nu(y) = 1$. Moreover, any $y$ belongs to the support of only a finite number of the $\psi_\nu$. See e.g. Gel'fand and Shilov (1964, v. 1, p. 142) for a construction.
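The construction can be made concrete in the simplest case $d_y = 1$. The sketch below shows one standard way to build such a partition of unity. It is not necessarily the construction in Gel'fand and Shilov. It assumes the bump $\theta(t) = \exp(-1/(1-t^2))$ on $(-1,1)$, places a copy at every integer $\nu$, and normalizes by the (locally finite) sum of all copies. The function names are illustrative.

```python
import math


def bump(t: float) -> float:
    """C-infinity bump exp(-1/(1 - t^2)) on (-1, 1), zero elsewhere."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))


def psi(nu: int, y: float) -> float:
    """psi_nu(y) = theta_nu(y) / sum_mu theta_mu(y), with theta_mu(y) = bump(y - mu).

    Only mu = floor(y) and floor(y) + 1 can be nonzero at y, and at least one of
    them is positive, so the denominator below is the full (positive) sum.
    """
    k = math.floor(y)
    total = bump(y - k) + bump(y - (k + 1))
    return bump(y - nu) / total


def active_indices(y: float) -> list[int]:
    """Indices nu with psi_nu(y) > 0; there are at most two of them."""
    k = math.floor(y)
    return [nu for nu in (k, k + 1) if psi(nu, y) > 0.0]


for y in (-2.7, 0.0, 0.5, 3.14159):
    s = sum(psi(nu, y) for nu in range(-10, 11))
    print(f"y={y}: active={active_indices(y)}, sum of psi_nu = {s:.12f}")
```

At any $y$ at most two of these $\psi_\nu$ are nonzero. This is the local finiteness required above.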

Then define $(g f_{y|x},\psi_\nu) = \int g(y) f_{y|x}(x,y)\psi_\nu(y)\,dy$. Under Assumption 3 this expression can be rewritten by integrating by parts, as usual, and using the boundedness of the support of $\psi_\nu$:

$$\int g(y) f_{y|x}(x,y)\psi_\nu(y)\,dy = (-1)^{d_y}\int F_{y|x}(x,y)\,\partial^{d_y}\big(g(y)\psi_\nu(y)\big)\,dy. \qquad (18)$$
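Identity (18) can be checked numerically when $d_y = 1$. Purely for illustration, the sketch below assumes that at a fixed $x$ the conditional distribution of $y$ is normal with mean 0.3 and standard deviation 1.2. It takes $g(y) = y$ and uses a single bump $\psi_\nu(y) = \theta(y - 0.8)$ as the test function, with $\theta(t) = \exp(-1/(1-t^2))$. Both sides of (18) are evaluated by the midpoint rule on the support of $\psi_\nu$.

```python
import math


def bump(t):
    # C-infinity bump supported on (-1, 1)
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def bump_prime(t):
    # derivative of the bump: bump(t) * (-2t / (1 - t^2)^2) inside the support
    return bump(t) * (-2.0 * t / (1.0 - t * t) ** 2) if abs(t) < 1.0 else 0.0


MU, SIGMA = 0.3, 1.2   # assumed conditional law of y at a fixed x: N(MU, SIGMA^2)
C = 0.8                # assumed centre of the test function psi_nu(y) = bump(y - C)


def cond_pdf(y):
    # f_{y|x}(x, y) at the fixed x
    return math.exp(-0.5 * ((y - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))


def cond_cdf(y):
    # F_{y|x}(x, y) at the fixed x
    return 0.5 * (1.0 + math.erf((y - MU) / (SIGMA * math.sqrt(2.0))))


def midpoint(f, a, b, n=4000):
    # composite midpoint rule on [a, b]
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


def g(y):          # g(y) = y, i.e. the conditional mean
    return y


def g_prime(y):
    return 1.0


# left-hand side of (18): integral of g(y) f_{y|x}(x, y) psi_nu(y) dy
lhs = midpoint(lambda y: g(y) * cond_pdf(y) * bump(y - C), C - 1.0, C + 1.0)

# right-hand side of (18) with d_y = 1: -integral of F_{y|x}(x, y) d/dy[g(y) psi_nu(y)] dy
rhs = -midpoint(
    lambda y: cond_cdf(y) * (g_prime(y) * bump(y - C) + g(y) * bump_prime(y - C)),
    C - 1.0,
    C + 1.0,
)

print(f"lhs = {lhs:.12f}, rhs = {rhs:.12f}")
```

The right-hand side uses only the distribution function. This is what allows the construction to go through when no density exists.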
This expression represents a generalized function on $D(W)$, given for any $\psi \in D(W)$ by

$$\left(\int g(y) f_{y|x}(x,y)\psi_\nu(y)\,dy,\ \psi\right) = (-1)^{d_y}\int\!\!\int F_{y|x}(x,y)\,\partial^{d_y}\big(g(y)\psi_\nu(y)\big)\,dy\,\psi(F(x))\,dF(x)$$
$$= (-1)^{d_x+d_y}\int\!\!\int F_{x,y}(x,y)\,\partial^{d_y}\big(g(y)\psi_\nu(y)\big)\,dy\,(\partial^{d_x}\psi)(F(x))\,dF(x).$$

Because the supports of $\psi_\nu$ and of $\psi$ are bounded and the function being integrated is bounded, the integral exists.

Assumption 4 (Existence of conditional moment). For a partition of unity $\{\psi_\nu\}$, the sum

$$\sum_\nu\left(\int g(y) f_{y|x}(x,y)\psi_\nu(y)\,dy,\ \psi\right) \qquad (19)$$

converges.

Then (19) represents $(m,\psi)$ for the generalized function $m(x) = \sum_\nu \int g(y) f_{y|x}(x,y)\psi_\nu(y)\,dy$ on $D(W)$. Thus, where the sum converges,

$$\sum_\nu \int g(y) f_{y|x}(x,y)\psi_\nu(y)\,dy = \int g(y) f_{y|x}(x,y)\sum_\nu\psi_\nu(y)\,dy = \int g(y) f_{y|x}(x,y)\,dy;$$



in other words, under Assumption 4 the order of integration and summation may be interchanged for the terms on the left-hand side of (18). This is not the case for the terms on the right-hand side of (18). For example, take $g(y) = y$ with $d_y = 1$. Then $\partial_y\big(g(y)\psi_\nu(y)\big) = y\psi_\nu'(y) + \psi_\nu(y)$ and $\sum_\nu \partial_y\big(g(y)\psi_\nu(y)\big) = 1$. However, $\int F_{y|x}(x,y)\,dy$ does not exist, since $F_{y|x}(x,y) \to 1$ as $y \to \infty$.
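The contrast between the two sides of (18) in this example can be seen numerically. The sketch below assumes $d_y = 1$ and builds the partition of unity from integer translates of the bump $\exp(-1/(1-t^2))$, normalized to sum to one. Summed over $\nu$, the derivatives $\partial_y(y\psi_\nu(y))$ equal one at every point. Meanwhile, the integral of a distribution function over an expanding interval grows roughly linearly with the upper limit. The standard normal distribution function is used here as an illustrative assumption.

```python
import math


def bump(t):
    # C-infinity bump supported on (-1, 1)
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def psi(nu, y):
    # partition of unity on R from integer translates of the bump (assumed construction)
    k = math.floor(y)
    return bump(y - nu) / (bump(y - k) + bump(y - k - 1))


def sum_of_derivatives(y, eps=1e-5):
    # sum over nu of d/dy [y psi_nu(y)], each derivative by a central difference;
    # only nu within distance 2 of y can contribute at y +/- eps
    k = math.floor(y)
    total = 0.0
    for nu in range(k - 2, k + 4):
        total += ((y + eps) * psi(nu, y + eps) - (y - eps) * psi(nu, y - eps)) / (2.0 * eps)
    return total


def normal_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def integral_of_cdf(upper, lower=-10.0, n=20000):
    # midpoint rule for the integral of the standard normal cdf over [lower, upper]
    step = (upper - lower) / n
    return step * sum(normal_cdf(lower + (i + 0.5) * step) for i in range(n))


for y in (-3.3, 0.0, 1.5, 7.9):
    print(f"sum_nu d/dy[y psi_nu] at y={y}: {sum_of_derivatives(y):.8f}")
for upper in (10.0, 20.0, 40.0):
    print(f"integral of F over [-10, {upper}]: {integral_of_cdf(upper):.4f}")
```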
Thus

$$(g f_{y|x},\psi\psi_\nu) = (-1)^{d_x+d_y}\int\cdots\int F_{x,y}(x,y)\,\partial^{d_x}\psi(F_x(x))\,\partial^{d_y}\big[g(y)\psi_\nu(y_1,\dots,y_{d_y})\big]\,dF_x(x)\,dy_1\cdots dy_{d_y}. \qquad (20)$$

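Then the conditional moment $m$ as a generalized function on $D(W)$ is given by

$$(m,\psi) = \sum_\nu(-1)^{d_x+d_y}\int\cdots\int F_{x,y}(x,y)\,\partial^{d_x}\psi(F_x(x))\,\partial^{d_y}\big[g(y)\psi_\nu(y_1,\dots,y_{d_y})\big]\,dF_x(x)\,dy_1\cdots dy_{d_y} \qquad (21)$$

with any $\{\psi_\nu\}$ representing a partition of unity on $\mathbb{R}^{d_y}$ by functions from $D(\mathbb{R}^{d_y})$.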
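Representation (21) can be checked numerically in a case where the conditional density does exist, so that both sides can be computed. The sketch makes the following assumptions:

- $d_x = d_y = 1$, $x \sim U(0,1)$, and $y = x + \varepsilon$ with standard normal $\varepsilon$. Then $F_x(x) = x$ on $(0,1)$, $m(x) = x$, and $F_{x,y}(x,y) = G(y) - G(y - x)$ with $G(t) = t\Phi(t) + \phi(t)$.
- The test function $\psi$ on $W$ is a rescaled bump supported in $(0.1, 0.9)$.
- The partition of unity is built from integer translates of the bump $\exp(-1/(1-t^2))$.
- The sum over $\nu$ is truncated to $-6 \le \nu \le 7$; outside this range the terms are negligible for this model.

```python
import math


def bump(t):
    # C-infinity bump supported on (-1, 1)
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def bump_prime(t):
    return bump(t) * (-2.0 * t / (1.0 - t * t) ** 2) if abs(t) < 1.0 else 0.0


# Partition of unity on R: psi_nu = theta_nu / S with theta_nu(y) = bump(y - nu)
def d_y_psi_nu(nu, y):
    """d/dy [y psi_nu(y)] = psi_nu(y) + y psi_nu'(y)."""
    k = math.floor(y)
    s = bump(y - k) + bump(y - k - 1)
    ds = bump_prime(y - k) + bump_prime(y - k - 1)
    th, dth = bump(y - nu), bump_prime(y - nu)
    return th / s + y * (dth * s - th * ds) / (s * s)


# Assumed model: x ~ U(0, 1), y = x + eps, eps ~ N(0, 1); then m(x) = E[y | x] = x.
def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def G(t):
    # antiderivative of Phi, so that F_xy(x, y) = G(y) - G(y - x) for 0 <= x <= 1
    return t * Phi(t) + phi(t)


# Test function on W = (0, 1), supported in (0.1, 0.9); here F_x(x) = x on (0, 1).
def psi_W(u):
    return bump((u - 0.5) / 0.4)


def dpsi_W(u):
    return bump_prime((u - 0.5) / 0.4) / 0.4


def midpoint_nodes(a, b, n):
    step = (b - a) / n
    return [a + (i + 0.5) * step for i in range(n)], step


xs, hx = midpoint_nodes(0.1, 0.9, 120)
dpsi_vals = [dpsi_W(x) for x in xs]

# (m, psi) = int m(x) psi(F_x(x)) dF_x(x) = int x psi(x) dx
lhs = hx * sum(x * psi_W(x) for x in xs)

# Right-hand side of (21) with d_x = d_y = 1, sign (-1)^(1+1) = +1,
# with the sum over nu truncated to -6 <= nu <= 7.
rhs = 0.0
for nu in range(-6, 8):
    ys, hy = midpoint_nodes(nu - 1.0, nu + 1.0, 240)
    for y in ys:
        g_y = G(y)
        inner = hx * sum((g_y - G(y - x)) * d for x, d in zip(xs, dpsi_vals))
        rhs += hy * inner * d_y_psi_nu(nu, y)

print(f"(m, psi) directly: {lhs:.8f};  via (21): {rhs:.8f}")
```

Each $\nu$-term is finite because $\psi_\nu$ has bounded support. The truncated sum converges because, for large $y$, $F_{x,y}$ is nearly constant in $y$ while $\int \partial_y(y\psi_\nu(y))\,dy = 0$.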
7 Limit properties of kernel estimators of conditional mean function

Suppose that, with $d_y = 1$, the conditional mean function $m(x) = E_{y|x}\,y$ exists. By (21) it can then be represented as

$$(m,\psi) = \sum_\nu(-1)^{d_x+1}\int\cdots\int F_{x,y}(x,y)\,\partial^{d_x}\psi(F_x(x))\,[y\psi_\nu'(y) + \psi_\nu(y)]\,dF_x(x)\,dy. \qquad (22)$$
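Consider the usual kernel estimator

$$\hat m(x) = \frac{\sum_{i=1}^n y_i\,K\left(\frac{x_i - x}{h}\right)}{\sum_{i=1}^n K\left(\frac{x_i - x}{h}\right)},$$

which can also be represented as

$$\hat m(x) = \frac{\int y\,\hat f_{x,y}(x,y)\,dy}{\hat f_x(x)} = \frac{\sum_\nu \int y\,\hat f_{x,y}(x,y)\psi_\nu(y)\,dy}{\hat f_x(x)}.$$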
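For $d_x = 1$, the usual kernel estimator above is the Nadaraya–Watson ratio. A minimal sketch follows. The Epanechnikov kernel is an assumed choice: it is bounded with bounded support, which matches the properties the Appendix uses. The function names are illustrative.

```python
def epanechnikov(u: float) -> float:
    # second-order kernel with support [-1, 1]
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x0: float, xs: list[float], ys: list[float], h: float) -> float:
    """Kernel estimate of E[y | x = x0] for scalar x (d_x = 1):
    sum_i y_i K((x_i - x0)/h) / sum_i K((x_i - x0)/h)."""
    weights = [epanechnikov((xi - x0) / h) for xi in xs]
    denom = sum(weights)
    if denom == 0.0:
        raise ValueError("no observations within one bandwidth of x0")
    return sum(w * yi for w, yi in zip(weights, ys)) / denom


xs = [i / 10 for i in range(11)]
ys = [2.0 * x for x in xs]
print(nadaraya_watson(0.5, xs, ys, h=0.25))
```

With the symmetric design above and $y = 2x$, the weights around $x_0 = 0.5$ are symmetric, so the estimate reproduces $m(0.5) = 1$ up to rounding.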

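Then for any continuously differentiable $\psi(x)$

$$(\hat m,\psi) = \int \frac{\int y\,\hat f_{x,y}(x,y)\,dy}{\hat f_x(x)}\,\psi(x)\,dx = -\sum_\nu \int \frac{\int \partial^{d_x}\tilde F_{x,y}(x,y)\,[y\psi_\nu(y)]'\,dy}{\hat f_x(x)}\,\psi(x)\,dx.$$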
Consider $\psi$ and the corresponding $\tilde\psi$ as in the Lemma. By the Lemma,

$$(\hat m,\psi\psi_\nu) = (-1)^{d_x+1}\int\!\!\int \hat F_{x,y}(x,y)\,\partial^{d_x}\psi(\hat F_x(x))\,\partial_y[y\psi_\nu(y)]\,d(\hat F_x(x))\,dy \qquad (23)$$
$$= (-1)^{d_x+1}\int\!\!\int \hat F_{x,y}(x,y)\,\partial^{d_x}\psi(\hat F_x(x))\,[y\psi_\nu'(y) + \psi_\nu(y)]\,d(\hat F_x(x))\,dy.$$

Assumption 5. The conditional second moment $\sigma^2(x) = E_{y|x}\,y^2$ defines a generalized function on $D(W)$.

Assumption 5 implies that for any $\psi \in D(W)$ the value of the functional $(\sigma^2,\psi) = \int \sigma^2(x)\,\psi(F_x(x))\,dF_x(x)$ is finite. This is required to bound the variance of the limit process. By (21), for a partition of unity,

$$(\sigma^2,\psi) = \sum_\nu(-1)^{d_x+1}\int\!\!\int F_{x,y}(x,y)\,\partial^{d_x}\psi(F_x(x))\,\big(y^2\psi_\nu(y)\big)'\,dF_x(x)\,dy.$$

Theorem 4. Suppose that Assumptions 1–5 hold and the bandwidth parameter is $h = cn^{-\alpha}$ with $\alpha > 1/4$. Then, for a random sample $\{(x_i,y_i)\}_{i=1}^n$, the estimator $\hat m(x)$ as a generalized random function on $D(W)$ converges at the rate $n^{-1/2}$ to the generalized function $m$ given by (22). The limit process for $\sqrt n(\hat m - m)$ on $D(W)$ is given by a random functional $Q_m$, indexed by $\psi \in D(W)$, with

$$(Q_m,\psi) = \sum_\nu(-1)^{d_x+1}\int\cdots\int\Big\{U_{x,y}\,\partial^{d_x}\psi(F_x(x))\,dF_x(x) + F_{x,y}(x,y)\,(\partial^{d_x})^2\psi(F_x(x))\,U_x\,dF_x(x)$$
$$+ F_{x,y}(x,y)\,\partial^{d_x}\psi(F_x(x))\,dU_x\Big\}\,[y\psi_\nu'(y) + \psi_\nu(y)]\,dy,$$

where $U_x$ and $U_{x,y}$ are Brownian bridge processes of dimension $d_x$ and $d_x + 1$, respectively. As a generalized random process, the limit process $Q_m$ of $\sqrt n(\hat m - m)$ is Gaussian with mean functional zero and covariance bilinear functional $C$, given for any $\psi_1,\psi_2$ by

$$(C,(\psi_1,\psi_2)) = \operatorname{cov}[(Q_m,\psi_1),(Q_m,\psi_2)].$$
Proof. See Appendix. 

As with the kernel estimator of the conditional distribution, the conditional mean estimator converges at the parametric rate as a functional on $D(W)$ for any distribution satisfying the assumptions. When a positive conditioning density exists, the conditional mean can be represented as a functional on $D(\mathbb{R}^{d_x})$ by the same arguments as in the Lemma. In the case of Remark 3, a similar rescaling gives a faster convergence rate for the estimator considered as a functional on $D(\mathbb{R}^{d_x})$.
8 Conclusion and further questions

The approach employed here makes it possible to avoid restrictions on the distribution when defining density, conditional distribution and conditional density. The same holds for conditional moments of a smooth function (e.g. the conditional expectation or second moment).

The usual kernel estimators converge to the limit generalized functions at a parametric rate. The limit process is a Gaussian process in the space of generalized functions, that is, a Gaussian process indexed by well-behaved functions from the appropriate spaces.

The results here were based on a random sample of observations to simplify exposition; extension to stationary ergodic or mixing processes can be obtained. Further extensions to relax homogeneity and independence are a subject of future research.

The limit results imply that, with a judicious selection of indexing functions, one could use the kernel estimators for inference in very general situations; this investigation is mostly left for future research.

9 Appendix

Proof of Theorem 2. 

Define a generalized function $e_{nhj}$ such that the value of the functional for $\psi \in G$ is

$$(e_{nhj},\psi) = \int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi(x)\,dx - (f,\psi),$$

where $h = (h_1,\dots,h_k)$ and the division is componentwise. Consider $e_{nh} = \frac1n\sum_{j=1}^n e_{nhj}$; this generalized function provides $\hat f - f$.

The expectation functional $Ee_{nh}$ gives the generalized bias of the estimator $\hat f$, $\operatorname{Bias}(\hat f)$, as defined in the main text.

Next, to derive the variance functional, consider $E\big[(e_{nhl},\psi_1)(e_{nhj},\psi_2)\big]$.

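For $l \ne j$, by independence,

$$E\big[(e_{nhl},\psi_1)(e_{nhj},\psi_2)\big] = E(e_{nhl},\psi_1)\,E(e_{nhj},\psi_2).$$

For $l = j$,

$$T_{jj} = E\big[(e_{nhj},\psi_1)(e_{nhj},\psi_2)\big] = \int\left[\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_1(x)\,dx - (f,\psi_1)\right]\left[\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_2(x)\,dx - (f,\psi_2)\right]dF(x_j) = T^1_j + T^2_j,$$

where

$$T^1_j = \int\left[\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_1(x)\,dx\right]\left[\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_2(x)\,dx\right]dF(x_j)$$

and

$$T^2_j = -\int\left[\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_1(x)\,dx\right]dF(x_j)\times(f,\psi_2) - \int\left[\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_2(x)\,dx\right]dF(x_j)\times(f,\psi_1) + (f,\psi_1)\times(f,\psi_2).$$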



For every vector $h$ and $s = 1, 2$,

$$\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_s(x)\,dx = \int K(w)\,\psi_s(x_j - hw)\,dw.$$

It follows by substituting into $T^2_j$ and expanding $\psi_s$ that $T^2_j = -E\psi_1(x)\,E\psi_2(x) + hR_2$.



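Similarly,

$$T^1_j = \int\left[\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_1(x)\,dx\right]\left[\int \frac{1}{h_1\cdots h_k}K\left(\frac{x_j - x}{h}\right)\psi_2(x)\,dx\right]dF(x_j)$$
$$= \int\left[\int K(w)\,\psi_1(x_j - hw)\,dw\right]\left[\int K(w)\,\psi_2(x_j - hw)\,dw\right]dF(x_j)$$
$$= \int\Big[\int K(w)\,dw\,\psi_1(x_j) - h\int K(w)\sum_{i=1}^k \tfrac{h_i}{h}\tfrac{\partial\psi_1}{\partial x_i}(x_j - h\bar w)\,w_i\,dw\Big]\Big[\int K(w)\,dw\,\psi_2(x_j) - h\int K(w)\sum_{i=1}^k \tfrac{h_i}{h}\tfrac{\partial\psi_2}{\partial x_i}(x_j - h\bar w)\,w_i\,dw\Big]dF(x_j)$$
$$= E\psi_1(x)\psi_2(x) + hR_1,$$

where, after the change of variable, $\psi_s(x_j - hw)$ is expanded around the point $x_j$ (and $\int K(w)\,dw = 1$). Next we establish that $|R_1| < \infty$ and $|R_2| < \infty$.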
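Indeed,

$$\psi_s(x - hw) = \psi_s(x) - h\sum_{i=1}^k \frac{h_i}{h}\,\frac{\partial\psi_s}{\partial x_i}(x - h\bar w)\,w_i, \qquad s = 1, 2, \qquad (24)$$

where $\bar w = \alpha w$ for some $0 < \alpha < 1$. Since $h_i \le h$ and $|w_i| \le 1$ on the support of $K$,

$$\left|\sum_{i=1}^k \frac{h_i}{h}\,\frac{\partial\psi_s}{\partial x_i}(x - h\bar w)\,w_i\right| \le \sum_{i=1}^k \sup\left|\frac{\partial\psi_s}{\partial x_i}\right|$$

holds. The right-hand side is uniformly bounded by some $B_{\psi_s} < \infty$ since $\psi_s \in D^{l+k}(U)$. Thus

$$|R_1| \le B_{\psi_1}\sup|\psi_2| + B_{\psi_2}\sup|\psi_1| + hB_{\psi_1}B_{\psi_2}.$$

Similarly, $|R_2| < \infty$.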

Combining, we get $T_{jj} = \operatorname{cov}(\psi_1,\psi_2) + O(h)$ as $h \to 0$.
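The statement $T_{jj} = \operatorname{cov}(\psi_1,\psi_2) + O(h)$ can be illustrated in one dimension with $\psi_1 = \psi_2 = \psi$. In that case it says that $T_{jj}$ approaches $\operatorname{var}\psi(x)$ as $h \to 0$. The sketch assumes $x_j \sim U(0,1)$, an Epanechnikov kernel, and a bump test function supported in $(0.1, 0.9)$. It evaluates $T_{jj}$ through the change of variable $\int K(w)\psi(x_j - hw)\,dw$, using midpoint quadrature.

```python
import math


def bump(t):
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def psi(x):
    # assumed test function, supported in (0.1, 0.9)
    return bump((x - 0.5) / 0.4)


def K(w):
    # Epanechnikov kernel on [-1, 1]; integrates to one
    return 0.75 * (1.0 - w * w) if abs(w) <= 1.0 else 0.0


def midpoint(f, a, b, n):
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


# Assumed distribution: x_j ~ U(0, 1), so dF(x_j) = dx_j on (0, 1).
mean_psi = midpoint(psi, 0.0, 1.0, 2000)                                # (f, psi) = E psi(x)
var_psi = midpoint(lambda x: (psi(x) - mean_psi) ** 2, 0.0, 1.0, 2000)  # cov(psi, psi)


def T_jj(h, n_x=400, n_w=100):
    """E[(e_nhj, psi)^2], using (e_nhj, psi) = int K(w) psi(x_j - h w) dw - (f, psi)."""
    def smoothed(xj):
        return midpoint(lambda w: K(w) * psi(xj - h * w), -1.0, 1.0, n_w)
    return midpoint(lambda xj: (smoothed(xj) - mean_psi) ** 2, 0.0, 1.0, n_x)


for h in (0.2, 0.05, 0.01):
    print(f"h={h}: T_jj={T_jj(h):.6f}, var psi={var_psi:.6f}")
```

For a symmetric kernel such as this one, the first-order term of the expansion integrates to zero. The gap here should therefore shrink roughly like $h^2$, faster than the $O(h)$ bound used in the proof.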
Consider now

$$\frac{1}{\sqrt n}\sum_{j=1}^n \eta_{nhj}. \qquad (25)$$

Note that here $\eta_{nhj} = e_{nhj} - \operatorname{Bias}(\hat f)$. This generalized random function has expectation zero. In the covariance the terms with $l \ne j$ are zero and

$$n^{-1}\sum_{j=1}^n E(\eta_{nhj},\psi_1)(\eta_{nhj},\psi_2) = T_{jj} + O(h),$$

which thus converges to $\operatorname{cov}(\psi_1,\psi_2)$.

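Next (similarly to Zinde-Walsh, 2008) we show the following. For any set of linearly independent functions $\psi_1,\dots,\psi_m \in D$ with $E(\psi_i^2) > 0$, the joint distribution of the vector

$$\left(\Big(\tfrac{1}{\sqrt n}\textstyle\sum_{j=1}^n \eta_{nhj},\psi_1\Big),\dots,\Big(\tfrac{1}{\sqrt n}\sum_{j=1}^n \eta_{nhj},\psi_m\Big)\right)'$$

converges to a multivariate Gaussian. Define similarly the vector $\vec\eta_{nhj}$ with components $(\eta_{nhj},\psi_i)$, $i = 1,\dots,m$. Denote by $\Sigma$ the $m \times m$ matrix with $ts$ component $\{\Sigma\}_{ts} = (C,(\psi_t,\psi_s))$, where the functional $C$ is given in Theorem 2. Denote by $\Sigma_n$ the covariance matrix of $\vec\eta_{nhj}$. By the convergence results for $T_{jj}$, $\Sigma_n \to \Sigma$. Since the functions $\psi_1,\dots,\psi_m$ are linearly independent and $E(\psi_i^2) > 0$, the matrix $\Sigma$ is invertible, and so is $\Sigma_n$ for large enough $n$. Define $\xi_{nhj} = \Sigma_n^{-1/2}\vec\eta_{nhj}$; then $\Sigma_n^{-1/2}\vec\eta_{nhj} - \Sigma^{-1/2}\vec\eta_{nhj} \to_p 0$.

Next, consider an $m \times 1$ vector $\lambda$ with $\lambda'\lambda = 1$. The random variables $\lambda'\xi_{nhj}$ are independent with expectation 0 and $\operatorname{var}\big(n^{-1/2}\sum_j \lambda'\xi_{nhj}\big) = 1$. Since the kernel function is bounded with finite support, they satisfy the Liapunov condition $\sum_{j=1}^n E\big|n^{-1/2}\lambda'\xi_{nhj}\big|^{2+\delta} \to 0$ for $\delta > 0$. Thus

$$\frac{1}{\sqrt n}\sum_{j=1}^n \lambda'\xi_{nhj} \to_d N(0,1).$$

By the Cramér–Wold theorem, convergence to a limit Gaussian process follows for $\Sigma_n^{-1/2}n^{-1/2}\sum_j\vec\eta_{nhj}$, and thus for $\Sigma^{-1/2}n^{-1/2}\sum_j\vec\eta_{nhj}$. ■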
Proof of Theorem 3. 

For a smooth kernel the estimated distribution $\tilde F(x,y)$ is smooth. Hence, by the Lemma, for $\psi \in D((0,1)^{d_x})$ the value of the functional $(\tilde F_{y|x},\psi)$ is the same as $(\tilde F_{y|x},\tilde\psi)$, where $\tilde\psi = \tilde f_x\,\psi(\tilde F_x)$. Thus for any $\psi \in D((0,1)^{d_x})$:

$$(\tilde F_{y|x},\psi) = \int \tilde F_{y|x}(x,y)\,\tilde f_x(x)\,\psi(\tilde F_x(x))\,dx. \qquad (26)$$



More concisely,

$$(\tilde F_{y|x},\psi) = (-1)^{d_x}\int \hat F_{x,y}(x,y)\,\partial^{d_x}\psi(\hat F_x(x))\,d\hat F_x(x)$$
$$+ (-1)^{d_x}\left[\int \tilde F_{x,y}(x,y)\,\partial^{d_x}\psi(\tilde F_x(x))\,d\tilde F_x(x) - \int \hat F_{x,y}(x,y)\,\partial^{d_x}\psi(\hat F_x(x))\,d\hat F_x(x)\right].$$

Here "hat" indicates the empirical distribution function and "tilde" the kernel-estimated distribution function. By standard arguments the smooth kernel introduces a bias. Using the usual expansions and the differentiability of $\psi$, we get for a second-order kernel

$$(-1)^{d_x}\left[\int \tilde F_{x,y}(x,y)\,\partial^{d_x}\psi(\tilde F_x(x))\,d\tilde F_x(x) - \int \hat F_{x,y}(x,y)\,\partial^{d_x}\psi(\hat F_x(x))\,d\hat F_x(x)\right] = O_p(h^2).$$
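The $O_p(h^2)$ order of the kernel bias rests on a familiar expansion for a symmetric second-order kernel with $\mu_2 = \int w^2K(w)\,dw$:

$$\int K(w)\psi(x - hw)\,dw = \psi(x) + \tfrac{\mu_2}{2}h^2\psi''(x) + O(h^4).$$

The sketch below checks this expansion numerically in one dimension. It only illustrates the mechanism, not the full multivariate argument. It assumes an Epanechnikov kernel ($\mu_2 = 0.2$) and a bump test function centred at 0.5.

```python
import math


def bump(t):
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def psi(u):
    # assumed test function on W = (0, 1): bump centred at 0.5 with half-width 0.4
    return bump((u - 0.5) / 0.4)


def K(w):
    # Epanechnikov kernel: symmetric, integrates to one, mu2 = int w^2 K(w) dw = 0.2
    return 0.75 * (1.0 - w * w) if abs(w) <= 1.0 else 0.0


def smoothed(u, h, n=2000):
    # midpoint rule for int K(w) psi(u - h w) dw over the kernel support [-1, 1]
    step = 2.0 / n
    total = 0.0
    for i in range(n):
        w = -1.0 + (i + 0.5) * step
        total += K(w) * psi(u - h * w)
    return step * total


MU2 = 0.2
PSI_PP_CENTRE = -2.0 / (math.e * 0.4 ** 2)  # psi''(0.5) = bump''(0) / 0.4^2, bump''(0) = -2/e
predicted = 0.5 * MU2 * PSI_PP_CENTRE        # leading coefficient of h^2 in the bias

ratios = {h: (smoothed(0.5, h) - psi(0.5)) / h ** 2 for h in (0.1, 0.05, 0.025)}
for h, r in ratios.items():
    print(f"h={h}: (smoothed - psi) / h^2 = {r:.5f}  (predicted {predicted:.5f})")
```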






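Represent $(-1)^{d_x}\int \hat F_{x,y}(x,y)\,\partial^{d_x}\psi(\hat F_x(x))\,d\hat F_x(x)$ as

$$(-1)^{d_x}\Big\{\int F_{x,y}\,\partial^{d_x}\psi(F_x)\,dF_x + \int F_{x,y}\big[(\partial^{d_x}\partial^{d_x}\psi)(F_x)(\hat F_x - F_x) + r(\hat F_x - F_x)\big]\,dF_x$$
$$+ \int F_{x,y}\,\partial^{d_x}\psi(F_x)\,d(\hat F_x - F_x) + \int F_{x,y}\,(\partial^{d_x}\partial^{d_x}\psi)(\bar F_x)(\hat F_x - F_x)\,d(\hat F_x - F_x)$$
$$+ \int (\hat F_{x,y} - F_{x,y})\,\partial^{d_x}\psi(F_x)\,dF_x + \int (\hat F_{x,y} - F_{x,y})\,\partial^{d_x}\psi(F_x)\,d(\hat F_x - F_x)$$
$$+ \int (\hat F_{x,y} - F_{x,y})\,(\partial^{d_x}\partial^{d_x}\psi)(\bar F_x)(\hat F_x - F_x)\,dF_x$$
$$+ \int (\hat F_{x,y} - F_{x,y})\,(\partial^{d_x}\partial^{d_x}\psi)(\bar F_x)(\hat F_x - F_x)\,d(\hat F_x - F_x)\Big\},$$

where $\bar F_x$ represents an intermediate value and takes values in $(0,1)^{d_x}$. By the properties of $\psi \in D(W)$, the function $(\partial^{d_x}\partial^{d_x}\psi)(\bar F_x)$ is bounded. Then $\sqrt n\,(\tilde F_{y|x} - F_{y|x},\psi)$ can be expressed as

$$Q_\psi\big(\sqrt n(\hat F_x - F_x),\ \sqrt n(\hat F_{xy} - F_{xy})\big) + n^{-1/2}R\big(\sqrt n(\hat F_x - F_x),\ \sqrt n(\hat F_{xy} - F_{xy})\big),$$

where

$$Q_\psi\big(\sqrt n(\hat F_x - F_x),\ \sqrt n(\hat F_{xy} - F_{xy})\big) = (-1)^{d_x}\Big\{\int F_{x,y}\,[(\partial^{d_x}\psi)(F_x)]\,d\sqrt n(\hat F_x - F_x) + \int \sqrt n(\hat F_{x,y} - F_{x,y})\,[(\partial^{d_x}\psi)(F_x)]\,dF_x$$
$$+ \int F_{x,y}\,[(\partial^{d_x}\partial^{d_x}\psi)(F_x)]\,\sqrt n(\hat F_x - F_x)\,dF_x\Big\}$$

and $R(\cdot,\cdot)$ is a bounded function.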

The limit process of $\sqrt n(\hat F - F)$ is $U$, a Brownian bridge, and the function $Q_\psi$ is continuous in its arguments. Hence, by Donsker's theorem and the continuous mapping theorem, we can express the limit process for $\sqrt n(\tilde F_{y|x} - F_{y|x},\psi)$ as $(Q_{y|x},\psi) = Q_\psi(U_x,U_{xy})$. This is obtained by substituting the limit Brownian bridge processes for the arguments of $Q_\psi(\cdot,\cdot)$.

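For any $\psi_1,\dots,\psi_l \in D(W)$ the joint limit process for

$$\Big(\sqrt n(\tilde F_{y|x} - F_{y|x},\psi_1),\dots,\sqrt n(\tilde F_{y|x} - F_{y|x},\psi_l)\Big)$$

is similarly given by the joint process $\big(Q_{\psi_1}(U_x,U_{xy}),\dots,Q_{\psi_l}(U_x,U_{xy})\big)$. This is a Gaussian process. The mean is zero since $Q_\psi$ is linear in its arguments, and the covariance is given by $\operatorname{cov}\big(Q_{\psi_1}(U_x,U_{xy}),Q_{\psi_2}(U_x,U_{xy})\big) = \operatorname{cov}\big((Q_{y|x},\psi_1),(Q_{y|x},\psi_2)\big)$. Existence follows from the boundedness of the functions in the expressions and the bounded support of $\psi$.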

By assumption of the theorem $h^2 = o(n^{-1/2})$, so the bias term contributes $\sqrt n\,O_p(h^2) = o_p(1)$. Thus the limit process is fully described by $Q_{y|x}$.

■

Proof of Theorem 4. 
For (23) we obtain

$$(-1)^{d_x+1}\int\!\!\int \hat F_{x,y}(x,y)\,\partial^{d_x}\psi(\hat F_x(x))\,[y\psi_\nu'(y) + \psi_\nu(y)]\,d(\hat F_x(x))\,dy$$
$$= (-1)^{d_x+1}\Big\{\int\!\!\int F_{x,y}(x,y)\,\partial^{d_x}\psi(F_x(x))\,[y\psi_\nu'(y) + \psi_\nu(y)]\,d(F_x(x))\,dy$$
$$+ \int\!\!\int [\hat F_{x,y}(x,y) - F_{x,y}(x,y)]\,[y\psi_\nu'(y) + \psi_\nu(y)]\,\partial^{d_x}\psi(F_x(x))\,d(F_x(x))\,dy$$
$$+ \int\!\!\int F_{x,y}(x,y)\,(\partial^{d_x})^2\psi(F_x(x))\,\{\hat F_x(x) - F_x(x)\}\,[y\psi_\nu'(y) + \psi_\nu(y)]\,d(F_x(x))\,dy$$
$$+ \int\!\!\int F_{x,y}(x,y)\,\partial^{d_x}\psi(F_x(x))\,[y\psi_\nu'(y) + \psi_\nu(y)]\,d(\hat F_x(x) - F_x(x))\,dy + R\Big\},$$

where $R$ combines the remaining terms. Analogously to the proof of Theorem 3, $\sqrt n(\hat m - m,\psi\psi_\nu)$ is represented as

$$Q_{\psi\psi_\nu}\big(\sqrt n(\hat F_x - F_x),\ \sqrt n(\hat F_{xy} - F_{xy})\big) + n^{-1/2}R\big(\sqrt n(\hat F_x - F_x),\ \sqrt n(\hat F_{xy} - F_{xy})\big).$$

The limit process for the first functional is expressed via the value of the functional for Brownian bridges,

$$Q_{\psi\psi_\nu}(U_x,U_{xy}) = (-1)^{d_x+1}\Big\{\int\!\!\int U_{x,y}\,[y\psi_\nu'(y) + \psi_\nu(y)]\,\partial^{d_x}\psi(F_x(x))\,d(F_x(x))\,dy \qquad (28)$$
$$+ \int\!\!\int F_{x,y}(x,y)\,(\partial^{d_x})^2\psi(F_x(x))\,U_x\,[y\psi_\nu'(y) + \psi_\nu(y)]\,d(F_x(x))\,dy$$
$$+ \int\!\!\int F_{x,y}(x,y)\,\partial^{d_x}\psi(F_x(x))\,[y\psi_\nu'(y) + \psi_\nu(y)]\,d(U_x)\,dy\Big\}.$$



This process is Gaussian with mean zero. Summing over $\nu$ we get a zero-mean limit process, $(Q_m,\psi) = \sum_\nu Q_{\psi\psi_\nu}(U_x,U_{xy})$. We need to verify that the bilinear covariance functional $\operatorname{cov}\big((Q_m,\psi_1),(Q_m,\psi_2)\big)$ is well-defined (bounded) for any $\psi_1,\psi_2$.

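Since the expectation of $Q_m$ is zero,

$$\big|\operatorname{cov}\big((Q_m,\psi_1),(Q_m,\psi_2)\big)\big| \le \big[E(Q_m,\psi_1)^2\,E(Q_m,\psi_2)^2\big]^{1/2},$$
$$E(Q_m,\psi)^2 = E\Big(\sum_\nu Q_{\psi\psi_\nu}(U_x,U_{xy})\Big)^2.$$

Thus it is sufficient to consider the variance for an arbitrary $\psi$.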

The representation in (28) involves three terms. It is sufficient to show that, for each type of term, the variance of its sum over all $\nu$ is bounded.

Recall that here $\operatorname{cov}(U_{z_1},U_{z_2}) = F(z) - F(z_1)F(z_2)$, where $z = z_1 \wedge z_2$ (the componentwise minimum).
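The covariance formula can be checked by simulation. For an i.i.d. sample it already holds exactly for $\sqrt n(\hat F - F)$ at every $n$, not only in the limit. The sketch assumes $(x, y)$ uniform on the unit square, so $F(x,y) = xy$. It compares a Monte Carlo covariance at two points with $F(z_1 \wedge z_2) - F(z_1)F(z_2)$, where $\wedge$ is the componentwise minimum. The sample sizes and the seed are arbitrary choices.

```python
import random


def empirical_process_cov(z1, z2, n=100, reps=3000, seed=12345):
    """Monte Carlo covariance of sqrt(n) (F_hat - F) evaluated at z1 and z2.

    Assumed distribution: (x, y) uniform on [0, 1]^2, so F(x, y) = x * y.
    """
    rng = random.Random(seed)
    f1, f2 = z1[0] * z1[1], z2[0] * z2[1]
    a_vals, b_vals = [], []
    for _ in range(reps):
        c1 = c2 = 0
        for _ in range(n):
            x, y = rng.random(), rng.random()
            c1 += x <= z1[0] and y <= z1[1]
            c2 += x <= z2[0] and y <= z2[1]
        a_vals.append(n ** 0.5 * (c1 / n - f1))
        b_vals.append(n ** 0.5 * (c2 / n - f2))
    ma, mb = sum(a_vals) / reps, sum(b_vals) / reps
    return sum((a - ma) * (b - mb) for a, b in zip(a_vals, b_vals)) / (reps - 1)


def bridge_cov(z1, z2):
    """F(z1 ^ z2) - F(z1) F(z2) for F(x, y) = x * y, with ^ the componentwise minimum."""
    zmin = (min(z1[0], z2[0]), min(z1[1], z2[1]))
    return zmin[0] * zmin[1] - (z1[0] * z1[1]) * (z2[0] * z2[1])


z1, z2 = (0.3, 0.7), (0.6, 0.4)
print(f"simulated: {empirical_process_cov(z1, z2):.4f}, formula: {bridge_cov(z1, z2):.4f}")
```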

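Start with the first term in (28) and consider its variance. Evaluate

$$E\Big\{\int\cdots\int U_{x_1,y_1}U_{x_2,y_2}\,[y_1\psi_{\nu_1}'(y_1) + \psi_{\nu_1}(y_1)]\,[y_2\psi_{\nu_2}'(y_2) + \psi_{\nu_2}(y_2)]\,dy_1\,dy_2$$
$$\cdot\,\partial^{d_x}\psi(F_x(x_1))\,d(F_x(x_1))\,\partial^{d_x}\psi(F_x(x_2))\,d(F_x(x_2))\Big\} = E_1 - E_{1,2},$$

with

$$E_1 = \int\cdots\int F(x_1,y_1)\,[y_1\psi_{\nu_1}'(y_1) + \psi_{\nu_1}(y_1)]\left(\int^{y_1}[y_2\psi_{\nu_2}'(y_2) + \psi_{\nu_2}(y_2)]\,dy_2\right)$$
$$\cdot\,\partial^{d_x}\psi(F_x(x_1))\,d(F_x(x_1))\left(\int^{x_1}\partial^{d_x}\psi(F_x(x_2))\,d(F_x(x_2))\right)dy_1$$

and $E_{1,2} = \bar E_1\bar E_2$, where for $i = 1, 2$

$$\bar E_i = \int\cdots\int F(x_i,y_i)\,[y_i\psi_{\nu_i}'(y_i) + \psi_{\nu_i}(y_i)]\,\partial^{d_x}\psi(F_x(x_i))\,d(F_x(x_i))\,dy_i.$$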

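For $E_1$, integrating, we get (dropping the subscript 1 on variables)

$$\int\cdots\int F(x,y)\,[y^2\psi_{\nu_1}'(y)\psi_{\nu_2}(y) + y\psi_{\nu_1}(y)\psi_{\nu_2}(y)]\,dy\cdot\tfrac12\,\partial^{d_x}\psi^2(F_x(x))\,dF(x).$$

By construction of the partition of unity, both $\big|\sum_{\nu_1,\nu_2}\psi_{\nu_1}'(y)\psi_{\nu_2}(y)\big|$ and $\sum_{\nu_1,\nu_2}\psi_{\nu_1}(y)\psi_{\nu_2}(y)$ are uniformly bounded, say by some $B$. We get

$$\Big|\sum_{\nu_1,\nu_2}\int\cdots\int F(x,y)\,[y^2\psi_{\nu_1}'(y)\psi_{\nu_2}(y) + y\psi_{\nu_1}(y)\psi_{\nu_2}(y)]\,dy\cdot\tfrac12\,\partial^{d_x}\psi^2(F_x(x))\,dF(x)\Big| \le \frac{B}{2}\Big[\big|(E_{y|x}(y^2),\psi^2)\big| + \big|(E_{y|x}\,y,\psi^2)\big|\Big].$$

Note that $\psi^2 \in D(W)$. By Assumption 5, then, this contribution to the covariance is bounded.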
Similarly, boundedness can be obtained for the contributions of all the other terms to the covariance. By the condition $h^2 = o(n^{-1/2})$ on the bandwidth, the bias does not affect the limit process.